Multi-pattern Matching with Wildcards

Authors

  • Meng Zhang
  • Yi Zhang
  • Jijun Tang
  • Xiaolong Bai
Abstract

Multi-pattern matching with wildcards is the problem of finding all occurrences of a set of patterns with wildcards in a text. The problem arises in various fields, such as computational biology and network security. However, it has not been studied as extensively as the single-pattern case, and no efficient algorithm is known for it. In this paper, we present efficient algorithms based on the fast Fourier transform (FFT). Let P = {p_1, ..., p_k} be a set of patterns with wildcards, where the total length of the patterns is |P|, and let t be a text of length n over the alphabet {a_1, ..., a_σ}. We present three algorithms for this problem, each of which matches all patterns simultaneously. The first algorithm finds the matches of a small set of patterns in the text in O(n log |P| + occ log k) time, where occ is the total number of occurrences of P in t. It uses words of k⌈2 lg σ⌉ + Σ_{i=1}^{k} ⌈lg |p_i|⌉ bits. The second algorithm is based on a prime number encoding. It runs in O(n log m + occ log k) time, where m is the length of the longest pattern in P, and uses words of k⌈lg(2mσ + k)⌉ bits. The third algorithm finds the occurrences of the patterns in the text in O(n log |P| log σ + occ log k) time by computing the Hamming distance between the patterns and the text. It uses words of Σ_{i=1}^{k} ⌈lg |p_i|⌉ bits. Moreover, we demonstrate an FFT implementation based on modular arithmetic for machines with 64-bit words. Finally, we show that these algorithms can be easily parallelized, and we give the parallelized algorithms as well.

Keywords: Algorithm; Multi-pattern matching; Wildcards; FFT.
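The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea of FFT-based wildcard matching with a modular-arithmetic transform. It is not the authors' algorithm: it uses the well-known sum-of-products test Σ_j p_j · t_{i+j} · (p_j − t_{i+j})^2 = 0 with wildcards encoded as 0, it handles the patterns one at a time instead of packing them into machine words, and the names `ntt`, `correlate`, `wildcard_matches` and `match_all`, the wildcard symbol `?`, and the modulus 998244353 are all assumptions chosen for illustration.

```python
MOD = 998244353  # assumed prime of the form c * 2^23 + 1, fits in a 64-bit word
ROOT = 3         # a primitive root modulo MOD


def ntt(a, invert=False):
    """In-place number-theoretic transform; len(a) must be a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD


def correlate(x, y):
    """Return c[i] = sum_j x[j] * y[i + j] (mod MOD) for i = 0 .. len(y) - len(x)."""
    m, n = len(x), len(y)
    size = 1
    while size < m + n:
        size <<= 1
    fx = x[::-1] + [0] * (size - m)  # reversing x turns convolution into correlation
    fy = y + [0] * (size - n)
    ntt(fx)
    ntt(fy)
    prod = [p * q % MOD for p, q in zip(fx, fy)]
    ntt(prod, invert=True)
    return prod[m - 1:n]


def wildcard_matches(pattern, text, wildcard="?"):
    """Start positions where pattern matches text; the wildcard matches any symbol."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    symbols = sorted(set(pattern + text) - {wildcard})
    code = {c: i + 1 for i, c in enumerate(symbols)}
    code[wildcard] = 0  # a zero factor removes the position from the sum
    sigma = max(len(symbols), 1)
    if m * sigma ** 4 >= MOD:
        # The true sum could wrap around the modulus and give false matches.
        raise ValueError("input too large for exact results with this modulus")
    p = [code[c] for c in pattern]
    t = [code[c] for c in text]
    a = correlate([v ** 3 for v in p], t)
    b = correlate([v * v for v in p], [v * v for v in t])
    c = correlate(p, [v ** 3 for v in t])
    return [i for i in range(n - m + 1) if (a[i] - 2 * b[i] + c[i]) % MOD == 0]


def match_all(patterns, text, wildcard="?"):
    """Match each pattern separately (no simultaneous word packing)."""
    return {pat: wildcard_matches(pat, text, wildcard) for pat in patterns}
```

For example, `wildcard_matches("a?c", "abcaxcabd")` reports the start positions of `abc` and `axc`. Working modulo a prime keeps every intermediate value an exact integer, which avoids the rounding concerns of a floating-point FFT, at the price of the size check above.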

Similar resources

On the Average-case Complexity of Pattern Matching with Wildcards

In this paper we present a number of fast average-case algorithms for pattern matching with wildcards. We consider the problems where wildcards are restricted to either the pattern or the text; however, the results can be easily adapted to the case where wildcards are allowed in both. We analyse the algorithms' average-case complexity and their expected-case complexity and show new lower bounds ...

Research on Pattern Matching with Wildcards and Length Constraints: Methods and Completeness

© 2012 Wang et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Research on Pattern Matching with Wildcards and Length Constraints: Methods and Complet...

Error Tree: A Tree Structure for Hamming & Edit Distances & Wildcards Matching

Error Tree is a novel tree structure that is mainly oriented to solve the approximate pattern matching problems, Hamming and edit distances, as well as the wildcards matching problem. The input is a text of length n over a fixed alphabet of length Σ, a pattern of length m, and k. The output is to find all positions that have ≤ k Hamming distance, edit distance, or wildcards matching with P. Th...

A Simple Obfuscation Scheme for Pattern-Matching with Wildcards

We give a simple and efficient method for obfuscating pattern matching with wildcards. In other words, we construct a way to check an input against a secret pattern, which is described in terms of prescribed values interspersed with unconstrained “wildcard” slots. As long as the support of the pattern is sufficiently sparse and the pattern itself is chosen from an appropriate distribution, we p...

Streaming Pattern Matching with d Wildcards

In the pattern matching with d wildcards problem we are given a text T of length n and a pattern P of length m that contains d wildcard characters, each denoted by a special symbol '?'. A wildcard character matches any other character. The goal is to establish for each m-length substring of T whether it matches P. In the streaming model variant of the pattern matching with d wildcards problem ...

Journal title:
  • JSW

Volume 6, Issue

Pages -

Publication date 2011